70 research outputs found

    Scientific Workflow Applications on Amazon EC2

    The proliferation of commercial cloud computing providers has generated significant interest in the scientific computing community. Much recent research has attempted to determine the benefits and drawbacks of cloud computing for scientific applications. Although clouds have many attractive features, such as virtualization, on-demand provisioning, and "pay as you go" usage-based pricing, it is not clear whether they can deliver the performance required by scientific applications at a reasonable price. In this paper we examine the performance and cost of clouds from the perspective of scientific workflow applications. We use three characteristic workflows to compare the performance of a commercial cloud with that of a typical HPC system, and we analyze the various costs associated with running those workflows in the cloud. We find that the performance of clouds is not unreasonable given the hardware resources provided, and that performance comparable to HPC systems can be achieved given similar resources. We also find that the cost of running workflows on a commercial cloud can be reduced by storing data in the cloud rather than transferring it from outside.
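
    As a back-of-the-envelope illustration of the storage-versus-transfer tradeoff described above, the Python sketch below compares the monthly cost of shipping workflow inputs into the cloud on every run against keeping them in cloud storage. All prices and workload numbers are hypothetical placeholders, not the figures measured in the paper.

    # Illustrative cost comparison: transfer inputs on every run vs. keep
    # them in cloud storage. All prices are assumed placeholders.

    HOURLY_INSTANCE_COST = 0.40   # $/instance-hour (assumed)
    TRANSFER_IN_COST = 0.10       # $/GB transferred into the cloud (assumed)
    STORAGE_COST = 0.15           # $/GB-month held in cloud storage (assumed)

    def monthly_cost(runs, instance_hours, input_gb, store_in_cloud):
        """Cost of `runs` workflow executions in one month under this model."""
        compute = runs * instance_hours * HOURLY_INSTANCE_COST
        if store_in_cloud:
            data = input_gb * STORAGE_COST             # storage rent, paid once
        else:
            data = runs * input_gb * TRANSFER_IN_COST  # transfer paid every run
        return compute + data

    # The more often the workflow runs, the sooner storage rent beats transfer.
    for runs in (1, 5, 20):
        transfer = monthly_cost(runs, instance_hours=10, input_gb=100, store_in_cloud=False)
        stored = monthly_cost(runs, instance_hours=10, input_gb=100, store_in_cloud=True)
        print(f"{runs:2d} runs/month: transfer ${transfer:7.2f} vs. stored ${stored:7.2f}")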

    Workflow task clustering for best effort systems with Pegasus

    Many scientific workflows are composed of fine computational granularity tasks, yet they contain thousands of such tasks and are data-intensive in nature, thus requiring resources such as the TeraGrid to execute efficiently. To improve the performance of such applications, we employ task clustering techniques to increase the computational granularity of workflow tasks. The goal is to minimize the completion time of the workflow by reducing the impact of queue wait times. In this paper, we examine the performance impact of these clustering techniques using the Pegasus workflow management system. Experiments performed using an astronomy workflow on the NCSA TeraGrid cluster show that clustering can achieve a significant reduction in the workflow completion time (up to 97%).
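
    A toy model makes the intuition behind clustering concrete: when every submitted job pays a fixed queue wait, merging fine-grained tasks into larger jobs amortizes that overhead, at the cost of less parallelism. The sketch below uses invented task runtimes, queue waits, and slot counts; it is not the model or the data from the paper.

    import math

    # Toy model of horizontal task clustering: n fine-grained tasks, each
    # running `task_s` seconds, submitted to a batch system where every job
    # pays an average queue wait of `wait_s` seconds. Numbers are invented.

    def level_completion_time(n_tasks, task_s, wait_s, cluster_size, slots):
        """Approximate completion time of one workflow level after clustering."""
        n_jobs = math.ceil(n_tasks / cluster_size)  # jobs actually submitted
        job_runtime = cluster_size * task_s         # tasks run serially in a cluster
        waves = math.ceil(n_jobs / slots)           # jobs execute `slots` at a time
        return waves * (wait_s + job_runtime)

    n, task_s, wait_s, slots = 1000, 10, 300, 50
    for k in (1, 10, 50, 100):
        t = level_completion_time(n, task_s, wait_s, k, slots)
        print(f"cluster size {k:3d}: ~{t / 3600:.2f} h")

    Under these made-up parameters, moderate cluster sizes cut completion time sharply by paying the queue wait far fewer times, while very large clusters start to lose parallelism again.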

    Giving RSEs a Larger Stage through the Better Scientific Software Fellowship

    The Better Scientific Software Fellowship (BSSwF) was launched in 2018 to foster and promote practices, processes, and tools to improve developer productivity and software sustainability of scientific codes. BSSwF's vision is to grow the community with practitioners, leaders, mentors, and consultants to increase the visibility of scientific software production and sustainability. Over the last five years, many fellowship recipients and honorable mentions have identified as research software engineers (RSEs). This paper provides case studies from several of the program's participants to illustrate some of the diverse ways BSSwF has benefited both the RSE and scientific communities. In an environment where the contributions of RSEs are too often undervalued, we believe that programs such as BSSwF can be a valuable means to recognize and encourage community members to step outside of their regular commitments and expand on their work, collaborations, and ideas for a larger audience.

    Optimizing Workflow Data Footprint

    In this paper we examine the issue of optimizing disk usage and scheduling large-scale scientific workflows onto distributed resources, where the workflows are data-intensive, requiring large amounts of data storage, and the resources have limited storage. Our approach is two-fold: we minimize the amount of space a workflow requires during execution by removing data files at runtime when they are no longer needed, and we demonstrate that workflows may have to be restructured to reduce their overall data footprint. We show the results of our data management and workflow restructuring solutions using a Laser Interferometer Gravitational-Wave Observatory (LIGO) application and an astronomy application, Montage, running on a large-scale production grid, the Open Science Grid. We show that while a 48% reduction in the data footprint of Montage can be achieved with dynamic data cleanup techniques, LIGO Scientific Collaboration workflows require additional restructuring to achieve a 56% reduction in data space usage. We also examine the cost of the workflow restructuring in terms of the application's runtime.
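
    The runtime data cleanup idea can be sketched compactly: as tasks of the DAG complete, delete any intermediate file whose consumers have all finished, and track peak disk usage. The three-task workflow and file sizes below are invented for illustration and do not correspond to the LIGO or Montage workflows.

    # Sketch of dynamic data cleanup over a workflow DAG: a file is removed
    # as soon as every task that reads it has finished. Workflow, sizes, and
    # execution order are invented for illustration.

    sizes = {"raw": 50, "calib": 40, "mosaic": 20}  # GB per file
    consumers = {"raw": {"calibrate"}, "calib": {"assemble"}, "mosaic": set()}
    produces = {"stage_in": ["raw"], "calibrate": ["calib"], "assemble": ["mosaic"]}
    order = ["stage_in", "calibrate", "assemble"]   # topological order

    on_disk, peak = set(), 0.0
    remaining = {f: set(ts) for f, ts in consumers.items()}  # consumers not yet run

    for task in order:
        on_disk.update(produces[task])
        peak = max(peak, sum(sizes[f] for f in on_disk))
        for f in list(on_disk):
            remaining[f].discard(task)
            # intermediate file whose consumers have all finished: clean it up
            if consumers[f] and not remaining[f]:
                on_disk.remove(f)

    print(f"peak with cleanup: {peak} GB; without: {sum(sizes.values())} GB")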

    Novel proposals for FAIR, automated, recommendable, and robust workflows

    Funding: This work is partly funded by NSF award OAC-1839900. This material is based upon work supported by the U.S. Department of Energy, Office of Science, under contract number DE-AC02-06CH11357. libEnsemble was developed as part of the Exascale Computing Project (17-SC-20-SC), a collaborative effort of the U.S. Department of Energy Office of Science and the National Nuclear Security Administration. This research used resources of the OLCF at ORNL, which is supported by the Office of Science of the U.S. DOE under Contract No. DE-AC05-00OR22725.

    Lightning talks of the Workflows in Support of Large-Scale Science (WORKS) workshop are a venue where the workflow community (researchers, developers, and users) can discuss work in progress, emerging technologies and frameworks, and training and education materials. This paper summarizes the WORKS 2022 lightning talks, which cover five broad topics: data integrity of scientific workflows; a machine learning-based recommendation system; a Python toolkit for running dynamic ensembles of simulations; a cross-platform, high-performance computing utility for processing shell commands; and a meta(data) framework for reproducing hybrid workflows.

    Provenance: The Bridge Between Experiments and Data

    Current scientific applications are often structured as workflows and rely on workflow systems to compile abstract experiment designs into enactable workflows that utilise the best available resources. The automation of this step, and of the workflow enactment, hides the details of how results have been produced. Knowing how compilation and enactment occurred allows results to be reconnected with the experiment design. We investigate how provenance helps scientists to connect their results with the actual execution that took place, their original experiment, and its inputs and parameters.
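
    A provenance record of the kind discussed here needs, at minimum, to tie a result back to the abstract experiment design, the compiled workflow, and the inputs and parameters of the actual enactment. The dataclass below is a hypothetical minimal sketch, not the schema of any particular workflow or provenance system.

    from dataclasses import dataclass, field
    from datetime import datetime, timezone

    # Hypothetical, minimal provenance record linking a result back to the
    # experiment design, the compiled workflow, and the actual enactment.
    @dataclass
    class ProvenanceRecord:
        experiment: str    # abstract experiment design identifier
        workflow_id: str   # concrete workflow produced by the compiler
        inputs: dict       # input datasets by logical name
        parameters: dict   # parameter values used for this run
        site: str          # where the enactment actually ran
        started: str = field(default_factory=lambda: datetime.now(timezone.utc).isoformat())
        outputs: dict = field(default_factory=dict)

    # All identifiers and file names below are invented for illustration.
    rec = ProvenanceRecord(
        experiment="galaxy-survey-v2",
        workflow_id="wf-0042",
        inputs={"catalog": "lfn://survey/catalog.fits"},
        parameters={"band": "K", "resolution_arcsec": 1.0},
        site="open-science-grid",
    )
    rec.outputs["mosaic"] = "lfn://survey/mosaic.fits"
    print(rec)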